# Efficient inference
## DiffuCoder-7B-cpGRPO-8bit

*mlx-community · Large Language Model · Other · 272 downloads · 2 likes*

DiffuCoder-7B-cpGRPO-8bit is a code generation model converted to MLX format from apple/DiffuCoder-7B-cpGRPO, designed to give developers an efficient code generation tool.

## ERNIE-4.5-21B-A3B-PT-8bit

*mlx-community · Apache-2.0 · Large Language Model · Supports Multiple Languages · 123 downloads · 1 like*

ERNIE-4.5-21B-A3B-PT-8bit is an 8-bit quantized version of Baidu's ERNIE-4.5-21B-A3B-PT model, converted to MLX format for Apple Silicon devices.

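Several entries on this page are 8-bit conversions like the one above. As a rough illustration of what such a conversion stores, here is a simplified sketch of symmetric per-tensor int8 weight quantization in plain Python (a generic scheme for illustration only; MLX itself quantizes group-wise with per-group scales):

```python
# Generic sketch of symmetric per-tensor 8-bit weight quantization.
# Each float weight becomes an int8 value plus one shared scale factor.

def quantize_int8(weights):
    """Map float weights to int8 codes and a single scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

w = [0.42, -1.27, 0.08, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(s, 5), round(err, 4))
```

Dequantization multiplies each code back by the scale; the reconstruction error is bounded by half a quantization step, which is why 8-bit weights usually cost little accuracy.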
## Magistral-Small-2506-abliterated GGUF

*DevQuasar · Large Language Model · 423 downloads · 1 like*

A quantized GGUF version of huihui-ai's Magistral-Small-2506-abliterated, released as part of an effort to make knowledge accessible to everyone.

## SLANet_plus

*PaddlePaddle · Apache-2.0 · Text Recognition · Supports Multiple Languages · 1,121 downloads · 0 likes*

SLANet_plus is a table structure recognition model that converts non-editable table images into editable formats such as HTML. It plays an important role in table recognition pipelines, improving both the accuracy and the efficiency of table recognition.

## Qwen3-Reranker-0.6B GGUF

*DevQuasar · Large Language Model · 1,481 downloads · 3 likes*

A quantized version of Qwen3-Reranker-0.6B, released as part of an effort to make knowledge accessible to everyone.

## MiniCPM4-MCP

*openbmb · Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · 367 downloads · 14 likes*

MiniCPM4-MCP is an open-source edge-side large language model agent built on the 8-billion-parameter MiniCPM-4. It interacts with a variety of tools and data resources through the Model Context Protocol (MCP) to solve a wide range of real-world tasks.

## gemma-3-27b-it-quantized.w4a16

*RedHatAI · Image-to-Text · Transformers · 302 downloads · 1 like*

A quantized version of google/gemma-3-27b-it supporting vision-text input and text output. Optimized through INT4 weight quantization (w4a16), it enables efficient inference with vLLM.

## fpham Sydney-Overthinker-13b-HF GGUF

*featherless-ai-quants · Large Language Model · 133 downloads · 1 like*

Optimized GGUF quantized files that can significantly improve model performance. The quantizations are sponsored by Featherless AI, where users can run any model of their choice for a small fee.

## DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Compact

*QuantTrio · MIT · Large Language Model · Transformers · 258 downloads · 1 like*

A GPTQ-quantized version of DeepSeek-R1-0528 using an Int4 + selective Int8 scheme, which reduces file size while preserving generation quality.

## Qwen2-Audio-7B-Instruct i1-GGUF

*mradermacher · Apache-2.0 · Audio-to-Text · Transformers · English · 282 downloads · 0 likes*

A weighted/imatrix quantized version of Qwen2-Audio-7B-Instruct, supporting English audio-to-text transcription tasks.

## DeepSeek-R1-0528-Qwen3-8B-MLX-4bit

*lmstudio-community · MIT · Large Language Model · 274.40k downloads · 1 like*

A large language model developed by DeepSeek AI, optimized with 4-bit quantization and suited to Apple Silicon devices.

## DeepSeek-R1-0528-4bit

*mlx-community · Large Language Model · 157 downloads · 9 likes*

DeepSeek-R1-0528-4bit is a 4-bit quantized model converted from DeepSeek-R1-0528 and optimized for the MLX framework.

## llm-jp-3.1-1.8b-instruct4

*llm-jp · Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · 165 downloads · 3 likes*

A large language model developed by Japan's National Institute of Informatics. Built on LLM-jp-3, it significantly improves instruction-following ability through instruction pre-training.

## llm-jp-3.1-1.8b

*llm-jp · Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · 572 downloads · 1 like*

LLM-jp-3.1-1.8b is a large language model developed by Japan's National Institute of Informatics. Based on the LLM-jp-3 series, it incorporates instruction pre-training to enhance instruction-following ability.

## DMind-1-mini GGUF

*DevQuasar · Text Generation · 213 downloads · 1 like*

DMind-1-mini is a lightweight text generation model suitable for a variety of natural language processing tasks.

## academic-ds-9B GGUF

*DevQuasar · Large Language Model · 277 downloads · 1 like*

A quantized version of ByteDance-Seed's academic-ds-9B, aiming to make knowledge accessible to everyone.

## Devstral-Small-2505-MLX-4bit

*lmstudio-community · Apache-2.0 · Large Language Model · Supports Multiple Languages · 57.83k downloads · 3 likes*

The Devstral-Small-2505 model developed by mistralai, quantized to 4-bit in MLX format for Apple Silicon devices.

## Facebook KernelLLM GGUF

*bartowski · Other · Large Language Model · 5,151 downloads · 2 likes*

KernelLLM is a large language model developed by Facebook. This version is quantized with llama.cpp using imatrix, offering multiple quantization options to suit different hardware requirements.

## AM-Thinking-v1 GGUF

*bartowski · Apache-2.0 · Large Language Model · 671 downloads · 1 like*

A llama.cpp imatrix quantization of the a-m-team/AM-Thinking-v1 model, supporting multiple quantization types and suitable for text generation tasks.

## TheDrummer Snowpiercer-15B-v1 GGUF

*bartowski · MIT · Large Language Model · 4,783 downloads · 1 like*

A quantized version of the TheDrummer/Snowpiercer-15B-v1 model, produced with llama.cpp and suitable for text generation tasks.

## Mellum-4b-sft-rust GGUF

*Etherll · Apache-2.0 · Large Language Model · Supports Multiple Languages · 389 downloads · 1 like*

A large language model fine-tuned specifically for Rust fill-in-the-middle (FIM) code completion, built on JetBrains/Mellum-4b-base.

## Qwen3-30B-A3B-4bit-DWQ

*mlx-community · Apache-2.0 · Large Language Model · 561 downloads · 19 likes*

A 4-bit quantized version of the Qwen3-30B-A3B model, created with a custom DWQ quantization scheme distilled from 6-bit down to 4-bit, suitable for text generation tasks.

## Qwen3-30B-A3B-FP8-dynamic

*RedHatAI · Apache-2.0 · Large Language Model · Transformers · 187 downloads · 2 likes*

Qwen3-30B-A3B-FP8-dynamic is an FP8 quantized version of the Qwen3-30B-A3B model, significantly reducing memory requirements and computational cost while maintaining the accuracy of the original model.

## Qwen3-8B-FP8-dynamic

*RedHatAI · Apache-2.0 · Large Language Model · Transformers · 81 downloads · 1 like*

Qwen3-8B-FP8-dynamic is an FP8 quantized version of Qwen3-8B, significantly reducing GPU memory and disk space requirements while maintaining the original model's performance.

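The "dynamic" in the two FP8 entries above refers to computing activation scales at runtime rather than fixing them in advance from a calibration set. A minimal sketch of dynamic per-tensor scaling into the FP8 E4M3 range (E4M3's largest finite magnitude is 448; this simulates only the range mapping, not real FP8 rounding):

```python
# Sketch of dynamic per-tensor scaling for FP8 E4M3 quantization.
# The scale is derived at runtime from each tensor's absolute maximum,
# so the largest value lands exactly at the E4M3 representable limit.

E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3

def dynamic_fp8_scale(tensor):
    """Pick a per-tensor scale from the runtime absolute maximum."""
    return max(abs(x) for x in tensor) / E4M3_MAX

acts = [3.5, -120.0, 0.25, 77.0]
scale = dynamic_fp8_scale(acts)
scaled = [x / scale for x in acts]  # values now fit the E4M3 range
print(round(scale, 6), max(abs(x) for x in scaled))
```

Static FP8 schemes instead bake one scale in per tensor from calibration data; dynamic scaling trades a small runtime cost for robustness to activation outliers.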
## Industry-Project-V2

*omsh97 · Apache-2.0 · Large Language Model · 58 downloads · 0 likes*

An instruction fine-tuned model based on the Mistral architecture, suitable for zero-shot classification tasks.

## Qwen3-8B GGUF

*ZeroWw · MIT · Large Language Model · English · 236 downloads · 1 like*

A quantized text generation model that keeps the output and embedding tensors in f16 while quantizing the remaining tensors to q5_k or q6_k, yielding a smaller file with performance comparable to pure f16.

## Qwen3-4B GGUF

*ZeroWw · MIT · Large Language Model · English · 495 downloads · 2 likes*

A quantized text generation model with output and embedding tensors in f16 and the remaining tensors in q5_k or q6_k, yielding a smaller file with performance comparable to the pure f16 version.

## Qwen3-8B-Base

*unsloth · Apache-2.0 · Large Language Model · Transformers · 5,403 downloads · 1 like*

Qwen3-8B-Base belongs to the latest generation of the Tongyi (Qwen) large model series, with 8.2 billion parameters and support for 119 languages, and is suitable for a variety of natural language processing tasks.

## Qwen3-0.6B-Base-unsloth-bnb-4bit

*unsloth · Apache-2.0 · Large Language Model · Transformers · 10.84k downloads · 1 like*

Qwen3-0.6B-Base belongs to the latest generation of the Tongyi (Qwen) large language model series. It has 0.6B parameters, supports 119 languages, and offers a context length of up to 32,768 tokens.

## InternVL2_5-1B-MNN

*taobao-mnn · Apache-2.0 · Large Language Model · English · 2,718 downloads · 1 like*

A 4-bit quantized version of InternVL2_5-1B, suitable for text generation and chat scenarios.

## GLM-Z1-32B-0414-4bit

*mlx-community · MIT · Large Language Model · Supports Multiple Languages · 225 downloads · 2 likes*

A 4-bit quantized version converted from THUDM/GLM-Z1-32B-0414, suitable for text generation tasks.

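Many entries here use 4-bit weights. A generic sketch of group-wise asymmetric 4-bit quantization, where each group of weights stores its own scale and minimum and each value becomes a 4-bit code 0..15 (group size 4 here purely for readability; real kernels use larger groups, e.g. MLX defaults to 64):

```python
# Generic sketch of group-wise asymmetric 4-bit quantization.
# Each group keeps (codes, scale, minimum); codes are integers 0..15.

def quantize_4bit(weights, group_size=4):
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / 15 or 1.0          # avoid zero scale
        q = [min(15, max(0, round((w - lo) / scale))) for w in g]
        groups.append((q, scale, lo))
    return groups

def dequantize_4bit(groups):
    out = []
    for q, scale, lo in groups:
        out.extend(v * scale + lo for v in q)
    return out

w = [0.1, -0.8, 0.55, 0.3, 2.0, 1.2, -0.4, 0.0]
packed = quantize_4bit(w)
restored = dequantize_4bit(packed)
err = max(abs(a - b) for a, b in zip(w, restored))
print(round(err, 3))
```

Smaller groups bound the error more tightly but cost more metadata per weight, which is the central trade-off these 4-bit model releases tune.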
## OPENCLIP-SigLIP-Tiny-14-Distill-SigLIP-400m-cc9m

*PumeTu · MIT · Image Classification · 30 downloads · 0 likes*

A lightweight vision-language model based on the SigLIP architecture, distilled from the larger SigLIP-400m model, suitable for zero-shot image classification tasks.

## DeepSeek-R1-quantized.w4a16

*RedHatAI · MIT · Large Language Model · 119 downloads · 4 likes*

An INT4 weight-quantized version of DeepSeek-R1 that reduces GPU memory and disk space requirements by roughly 50% while maintaining the original model's performance.

## Falcon-E-3B-Base

*tiiuae · Other · Large Language Model · Transformers · 51 downloads · 6 likes*

Falcon-E is a 1.58-bit (ternary) language model developed by TII, with a pure Transformer architecture designed for efficient inference.

## BitNet-b1.58-2B-4T GGUF

*microsoft · MIT · Large Language Model · English · 25.77k downloads · 143 likes*

The first open-source, native 1-bit large language model developed by Microsoft Research, with 2 billion parameters trained on a corpus of 4 trillion tokens.

## BitNet-b1.58-2B-4T

*microsoft · MIT · Large Language Model · Transformers · English · 35.87k downloads · 846 likes*

The first open-source 2-billion-parameter native 1-bit large language model developed by Microsoft Research, trained on 4 trillion tokens. It demonstrates that native 1-bit models can significantly improve computational efficiency while matching the performance of full-precision open-source models of the same scale.

## BitNet-b1.58-2B-4T-bf16

*microsoft · MIT · Large Language Model · Transformers · English · 2,968 downloads · 24 likes*

An open-source native 1-bit large language model developed by Microsoft Research, with 2 billion parameters trained on a 4-trillion-token corpus, significantly improving computational efficiency.

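The "b1.58" in these BitNet entries refers to ternary weights: each weight takes one of three values, carrying log2(3) ≈ 1.58 bits of information. A minimal sketch of the absmean ternarization described for BitNet b1.58 (scale by the mean absolute weight, then round and clamp to {-1, 0, +1}):

```python
# Sketch of ternary ("1.58-bit") weight quantization, BitNet b1.58 style:
# scale by the mean absolute weight, then round and clamp to {-1, 0, +1}.

def ternarize(weights):
    gamma = sum(abs(w) for w in weights) / len(weights)  # absmean scale
    q = [max(-1, min(1, round(w / gamma))) for w in weights]
    return q, gamma

w = [0.9, -0.05, -1.3, 0.4]
q, gamma = ternarize(w)
print(q, round(gamma, 4))
```

Because every weight is -1, 0, or +1, matrix multiplication reduces to additions and subtractions scaled by gamma, which is the source of the efficiency gains these model cards claim.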
## ModernCamemBERT-base

*almanach · MIT · Large Language Model · Transformers · French · 213 downloads · 4 likes*

ModernCamemBERT is a French language model pre-trained on a high-quality 1T-token French corpus. It is the French counterpart of ModernBERT, focusing on long context and efficient inference.

## vit-base16-fine-tuned-crop-disease-model

*sabari15 · Transformers · 179 downloads · 2 likes*

A transformers model hosted on the Hugging Face Hub. Its model card does not explicitly state its functionality, though the name suggests a ViT-Base/16 fine-tuned for crop disease classification.

## MTMME-Merge-Gemma-2-9B NuSLERP (W0.7/0.3)

*zelk12 · Large Language Model · Transformers · 16 downloads · 2 likes*

A variant of Gemma-2-9B produced with SLERP-based merging, combining two differently weighted versions of the model (weights 0.7 and 0.3).

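The SLERP merge named above interpolates model weights along the arc of a sphere rather than the straight line between them, preserving weight norms better than plain averaging. A minimal SLERP sketch in plain Python (the 0.7/0.3 split mirrors the weights in the entry name; a real merge applies this tensor-by-tensor across two checkpoints):

```python
import math

# Sketch of SLERP (spherical linear interpolation) between two weight
# vectors a and b; t controls the blend (t=0 gives a, t=1 gives b).

def slerp(a, b, t):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos_theta = max(-1.0, min(1.0, dot / (na * nb)))
    theta = math.acos(cos_theta)
    if theta < 1e-8:                      # near-parallel: fall back to lerp
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

# Orthogonal unit vectors: the result stays on the unit sphere.
merged = slerp([1.0, 0.0], [0.0, 1.0], 0.7)
print([round(v, 4) for v in merged])
```

Unlike linear interpolation, which would shrink the result toward the origin here, SLERP keeps the merged vector at unit norm.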